Results 1 - 20 of 21
1.
Bull Math Biol ; 85(11): 107, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37749280

ABSTRACT

Early literature on genome rearrangement modelling views the problem of computing evolutionary distances as an inherently combinatorial one. In particular, attention is given to estimating distances using the minimum number of events required to transform one genome into another. In hindsight, this approach is analogous to early methods for inferring phylogenetic trees from DNA sequences such as maximum parsimony: both are motivated by the principle that the true distance minimises evolutionary change, and both are effective if this principle is a true reflection of reality. Recent literature considers genome rearrangement under statistical models, continuing this parallel with DNA-based methods, with the goal of using model-based methods (for example maximum likelihood techniques) to compute distance estimates that incorporate the large number of rearrangement paths that can transform one genome into another. Crucially, this approach requires one to decide upon a set of feasible rearrangement events and, in this paper, we focus on characterising well-motivated models for signed, uni-chromosomal circular genomes, where the number of regions remains fixed. Since rearrangements are often mathematically described using permutations, we isolate the sets of permutations representing rearrangements that are biologically reasonable in this context, for example inversions and transpositions. We provide precise mathematical expressions for these rearrangements, and then describe them in terms of the set of cuts made in the genome when they are applied. We directly compare cuts to breakpoints, and use this concept to count the distinct rearrangement actions which apply a given number of cuts. Finally, we provide some examples of rearrangement models, and include a discussion of some questions that arise when defining plausible models.


Subjects
Gene Rearrangement , Mathematical Concepts , Phylogeny , Biological Models , Genome , Algorithms , Genetic Models
2.
Bull Math Biol ; 85(3): 19, 2023 01 30.
Article in English | MEDLINE | ID: mdl-36715842

ABSTRACT

The algebraic properties of flattenings and subflattenings provide direct methods for identifying edges in the true phylogeny (and, by extension, the complete tree) using pattern counts from a sequence alignment. The relatively small number of possible internal edges among a set of taxa (compared to the number of binary trees) makes these methods attractive; however, more could be done to evaluate their effectiveness for inferring phylogenetic trees. This is the case particularly for subflattenings, and the work we present here makes progress in this area. We introduce software for constructing and evaluating subflattenings for splits, utilising a number of methods to make computing subflattenings more tractable. We then present the results of simulations we have performed in order to compare the effectiveness of subflattenings to that of flattenings in terms of split score distributions, and susceptibility to possible biases. We find that subflattenings perform similarly to flattenings in terms of the distribution of split scores on the trees we examined, but may be less affected by bias arising from both split size/balance and long branch attraction. These insights are useful for developing effective algorithms to utilise these tools for the purpose of inferring phylogenetic trees.
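The flattening for a split rearranges the tensor of site-pattern frequencies into a matrix whose rows are indexed by patterns on one side of the split and whose columns are indexed by patterns on the other. For four-state models, a split that belongs to the tree yields (in expectation) a flattening of rank at most 4. The NumPy sketch below assumes a plain ACGT alignment and scores a split by the relative singular-value mass beyond rank 4. That score is one common choice and not necessarily the one implemented in the paper's software, and subflattenings are not shown.

```python
import numpy as np

STATES = "ACGT"

def pattern_tensor(alignment):
    """Relative site-pattern frequencies, an array of shape (4,) * m for m sequences."""
    m = len(alignment)
    counts = np.zeros((4,) * m)
    for column in zip(*alignment):  # one alignment column at a time
        counts[tuple(STATES.index(c) for c in column)] += 1
    return counts / counts.sum()

def flattening(tensor, split_a):
    """Matrix with rows indexed by taxa in split_a and columns by the remaining taxa."""
    split_b = [i for i in range(tensor.ndim) if i not in split_a]
    k = tensor.shape[0]
    order = list(split_a) + split_b
    return np.transpose(tensor, order).reshape(k ** len(split_a), k ** len(split_b))

def split_score(tensor, split_a, rank=4):
    """Share of the flattening's Frobenius norm lying beyond the leading `rank` singular values."""
    s = np.linalg.svd(flattening(tensor, split_a), compute_uv=False)
    return float(np.sqrt(np.sum(s[rank:] ** 2)) / np.sqrt(np.sum(s ** 2)))
```

For a tensor built as a sum of four terms that factor across the split 01|23, the score of that split is zero up to rounding, while the incompatible split 02|13 typically scores well above zero.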


Subjects
Mathematical Concepts , Biological Models , Phylogeny , Software , Algorithms
3.
J Math Biol ; 84(6): 49, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35508785

ABSTRACT

We present a unified framework for modelling genomes and their rearrangements in a genome algebra, as elements that simultaneously incorporate all physical symmetries. Building on previous work utilising the group algebra of the symmetric group, we explicitly construct the genome algebra for the case of unsigned circular genomes with dihedral symmetry and show that the maximum likelihood estimate (MLE) of genome rearrangement distance can be computed validly and more efficiently in this setting. We then construct the genome algebra for a more general case, that is, for genomes that may be represented by elements of an arbitrary group and symmetry group, and show that the MLE computations can be performed entirely within this framework. There is no prescribed model in this framework; that is, it allows any choice of rearrangements that preserve the set of regions, along with arbitrary weights. Further, since the likelihood function is built from path probabilities (a generalisation of path counts), the framework may be utilised for any distance measure that is based on path probabilities.


Subjects
Genome , Genetic Models , Algorithms , Gene Rearrangement , Likelihood Functions
4.
J Bioinform Comput Biol ; 19(6): 2140015, 2021 12.
Article in English | MEDLINE | ID: mdl-34806949

ABSTRACT

Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The "position paradigm", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign to them relative probabilities/costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently-introduced framework of genome algebras, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst the implementation of this framework to calculate distances for genomes with "practical" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.


Subjects
Genome , Genomics , Molecular Evolution , Gene Rearrangement , Genetic Models , Probability
6.
J Math Biol ; 81(2): 549-573, 2020 08.
Article in English | MEDLINE | ID: mdl-32710155

ABSTRACT

A matrix Lie algebra is a linear space of matrices closed under the operation $[A, B] = AB - BA$. The "Lie closure" of a set of matrices is the smallest matrix Lie algebra which contains the set. In the context of Markov chain theory, if a set of rate matrices forms a Lie algebra, their corresponding Markov matrices are closed under matrix multiplication; this has been found to be a useful property in phylogenetics. Inspired by previous research involving Lie closures of DNA models, it was hypothesised that finding the Lie closure of a codon model could help to solve the problem of mis-estimation of the non-synonymous/synonymous rate ratio, $\omega$. We propose two different methods of finding a linear space from a model: the first is the linear closure, which is the smallest linear space which contains the model, and the second is the linear version, which changes multiplicative constraints in the model to additive ones. We then find the Lie closure of each of these linear spaces. Under both methods, it was found that closed codon models would require thousands of parameters, and that any partial solution to this problem that was of a reasonable size violated stochasticity. Investigation of toy models indicated that finding the Lie closure of matrix linear spaces which deviated only slightly from a simple model resulted in a Lie closure that was close to having the maximum number of parameters possible. Given that Lie closures are not practical, we propose further consideration of the two variants of linearly closed models.
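Computing a Lie closure amounts to repeatedly adding commutators $[A, B] = AB - BA$ to a linear span until it stops growing. The NumPy sketch below is a generic illustration of that idea and not the procedure used in the paper. The example generators (elementary matrices and Kimura two-parameter rate matrices) are assumptions chosen only because their closures are easy to check by hand.

```python
import numpy as np

def lie_closure(matrices, tol=1e-9):
    """Basis of the smallest matrix Lie algebra containing `matrices`.

    Commutators of the current basis are added until the span stops growing;
    an orthonormal basis of the flattened matrices is maintained with
    modified Gram-Schmidt.
    """
    n = matrices[0].shape[0]
    basis = []

    def try_add(m):
        v = np.asarray(m, dtype=float).reshape(-1)
        for b in basis:
            v = v - (b @ v) * b
        norm = np.linalg.norm(v)
        if norm > tol:  # genuinely new direction
            basis.append(v / norm)
            return True
        return False

    for m in matrices:
        try_add(m)
    grew = True
    while grew:
        grew = False
        current = [b.reshape(n, n) for b in basis]
        for x in current:
            for y in current:
                if try_add(x @ y - y @ x):
                    grew = True
    return [b.reshape(n, n) for b in basis]

# Kimura two-parameter generators (illustrative), states ordered A, G, C, T.
transitions = np.array([[-1, 1, 0, 0], [1, -1, 0, 0], [0, 0, -1, 1], [0, 0, 1, -1]])
transversions = np.array([[-2, 0, 1, 1], [0, -2, 1, 1], [1, 1, -2, 0], [1, 1, 0, -2]])
```

The two Kimura generators commute, so their span is already closed and has dimension 2. By contrast, the 2x2 elementary matrices E12 and E21 generate a 3-dimensional closure, because their commutator E11 - E22 lies outside their span.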


Subjects
Codon , DNA , Biological Models , Markov Chains , Phylogeny
7.
J Mol Evol ; 88(2): 136-150, 2020 03.
Article in English | MEDLINE | ID: mdl-31781936

ABSTRACT

The underlying structure of the canonical amino acid substitution matrix (aaSM) is examined by considering stepwise improvements in the differential recognition of amino acids according to their chemical properties during the branching history of the two aminoacyl-tRNA synthetase (aaRS) superfamilies. The evolutionary expansion of the genetic code is described by a simple parameterization of the aaSM, in which (i) the number of distinguishable amino acid types, (ii) the matrix dimension and (iii) the number of parameters, each increases by one for each bifurcation in an aaRS phylogeny. Parameterized matrices corresponding to trees in which the size of an amino acid sidechain is the only discernible property behind its categorization as a substrate, exclusively for a Class I or II aaRS, provide a significantly better fit to empirically determined aaSM than trees with random bifurcation patterns. A second split between polar and nonpolar amino acids in each Class effects a vastly greater further improvement. The earliest Class-separated epochs in the phylogenies of the aaRS reflect these enzymes' capability to distinguish tRNAs through the recognition of acceptor stem identity elements via the minor (Class I) and major (Class II) helical grooves, which is how the ancient operational code functioned. The advent of tRNA recognition using the anticodon loop supports the evolution of the optimal map of amino acid chemistry found in the later genetic code, an essentially digital categorization, in which polarity is the major functional property, compensating for the unrefined, haphazard differentiation of amino acids achieved by the operational code.


Subjects
Amino Acid Substitution , Amino Acyl-tRNA Synthetases/genetics , Genetic Code , Phylogeny , Amino Acids/genetics , Anticodon , Molecular Evolution , Genetic Models
8.
Bull Math Biol ; 81(2): 361-383, 2019 02.
Article in English | MEDLINE | ID: mdl-30073568

ABSTRACT

We present and explore a general method for deriving a Lie-Markov model from a finite semigroup. If the degree of the semigroup is k, the resulting model is a continuous-time Markov chain on k-states and, as a consequence of the product rule in the semigroup, satisfies the property of multiplicative closure. This means that the product of any two probability substitution matrices taken from the model produces another substitution matrix also in the model. We show that our construction is a natural generalization of the concept of group-based models.
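Multiplicative closure can be seen concretely in the group-based special case that the construction generalises. If $M_f[g, h] = f(h - g)$ for a probability vector $f$ on an abelian group, then $M_{f_1} M_{f_2} = M_{f_1 * f_2}$, where $*$ is group convolution. The sketch below checks this for the Klein four-group $\mathbb{Z}_2 \times \mathbb{Z}_2$. The vectors `f1` and `f2` are arbitrary illustrative values, and the semigroup construction itself is not shown.

```python
import itertools
import numpy as np

# Klein four-group Z2 x Z2: elements are bit pairs, the operation is addition mod 2.
GROUP = list(itertools.product([0, 1], repeat=2))
INDEX = {g: i for i, g in enumerate(GROUP)}

def add(g, h):
    return tuple((a + b) % 2 for a, b in zip(g, h))

def neg(g):
    return tuple((-a) % 2 for a in g)

def group_matrix(f):
    """Markov matrix M[g, h] = f(h - g) for a probability vector f indexed like GROUP."""
    return np.array([[f[INDEX[add(h, neg(g))]] for h in GROUP] for g in GROUP])

def convolve(f1, f2):
    """Group convolution: (f1 * f2)(c) = sum over a + b = c of f1(a) * f2(b)."""
    out = np.zeros(len(GROUP))
    for a in GROUP:
        for b in GROUP:
            out[INDEX[add(a, b)]] += f1[INDEX[a]] * f2[INDEX[b]]
    return out

f1 = np.array([0.7, 0.1, 0.15, 0.05])  # illustrative substitution distributions
f2 = np.array([0.6, 0.2, 0.1, 0.1])
product = group_matrix(f1) @ group_matrix(f2)  # again of the form group_matrix(...)
```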


Subjects
Markov Chains , Phylogeny , Computational Biology , Molecular Evolution , Mathematical Concepts , Genetic Models , Statistical Models , Stochastic Processes
9.
Bull Math Biol ; 81(2): 535-567, 2019 02.
Article in English | MEDLINE | ID: mdl-30264286

ABSTRACT

The calculation of evolutionary distance via models of genome rearrangement has an inherent combinatorial complexity. Various algorithms and estimators have been used to address this; however, many of these set quite specific conditions for the underlying model. A recently proposed technique, applying representation theory to calculate evolutionary distance between circular genomes as a maximum likelihood estimate, reduces the computational load by converting the combinatorial problem into a numerical one. We show that the technique may be applied to models with any choice of rearrangements and relative probabilities thereof; we then investigate the symmetry of circular genome rearrangement models in general. We discuss the practical implementation of the technique and, without introducing any bona fide numerical approximations, give the results of some initial calculations for genomes with up to 11 regions.


Subjects
Gene Rearrangement , Genetic Models , Phylogeny , Algorithms , Animals , Computational Biology , Molecular Evolution , Genome , Likelihood Functions , Mathematical Concepts , Statistical Models , Probability
10.
Syst Biol ; 67(5): 905-915, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29788496

ABSTRACT

We give a non-technical introduction to convergence-divergence models, a new modeling approach for phylogenetic data that allows for the usual divergence of lineages after lineage-splitting but also allows for taxa to converge, i.e. become more similar over time. By examining the $3$-taxon case in some detail, we illustrate that phylogeneticists have been "spoiled" in the sense of not having to think about the structural parameters in their models by virtue of the strong assumption that evolution is tree-like. We show that there are not always good statistical reasons to prefer the usual class of tree-like models over more general convergence-divergence models. Specifically, we show many $3$-taxon data sets can be equally well explained by supposing violation of the molecular clock due to change in the rate of evolution along different edges, or by keeping the assumption of a constant rate of evolution but instead assuming that evolution is not a purely divergent process. Given the abundance of evidence that evolution is not strictly tree-like, our discussion is an illustration that as phylogeneticists we need to think clearly about the structural form of the models we use. For cases with four taxa, we show that there will be far greater ability to distinguish models with convergence from non-clock-like tree models. [Akaike information criterion; convergence-divergence models; distinguishability; identifiability; likelihood; molecular clock; phylogeny.].


Subjects
Molecular Evolution , Genetic Models , Phylogeny , Biological Evolution
11.
J Theor Biol ; 423: 31-40, 2017 06 21.
Article in English | MEDLINE | ID: mdl-28435014

ABSTRACT

Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. Distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms to large-scale genome rearrangements. Corresponding corrections for genome rearrangement distances fall into three categories: empirical computational studies, Bayesian/MCMC approaches, and combinatorial approaches. Here, we introduce a maximum likelihood estimator for the inversion distance between a pair of genomes, using a recently introduced group-theoretic approach to modelling inversions. This MLE functions as a corrected distance: in particular, we show that, because of the way sequences of inversions interact with each other, it is quite possible for the minimal distance and the MLE distance to order the distances of two genomes from a third differently. A second aspect of this work tackles the problem of accounting for the symmetries of circular arrangements. Whereas a frame of reference is generally fixed and all computations are made accordingly, this work incorporates the action of the dihedral group so that distance estimates are free from any a priori frame of reference. The philosophy of accounting for symmetries can be applied to any existing correction method, and examples are offered.
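For very small genomes, the idea of an MLE distance that respects dihedral symmetry can be sketched by brute force. The sketch makes several assumptions: unsigned circular genomes, inversions of contiguous arcs, equal rates for all inversions, and a grid search over elapsed time $t$. Under these assumptions the transition matrix is $\exp(tQ)$, and the likelihood of a target sums over every rotation and reflection of that target, so no frame of reference is fixed. This is a stand-in for, not a reproduction of, the group-theoretic computation in the paper. `N_REGIONS` and all function names are hypothetical.

```python
import itertools
import numpy as np

N_REGIONS = 5  # hypothetical small genome: the state space has 5! = 120 arrangements

def arc_inversions(n):
    """Position maps reversing a contiguous arc of length 2..n-2 on a circle of n positions.

    Arcs of length n-1 are skipped because reversing one only reflects the whole genome.
    """
    maps = []
    for start in range(n):
        for length in range(2, n - 1):
            arc = [(start + k) % n for k in range(length)]
            p = list(range(n))
            for a, b in zip(arc, reversed(arc)):
                p[a] = b
            maps.append(tuple(p))
    return maps

def apply(p, genome):
    """After the rearrangement, position i holds the region previously at position p[i]."""
    return tuple(genome[p[i]] for i in range(len(genome)))

def dihedral_class(genome):
    """All rotations and reflections of an unsigned circular genome."""
    n = len(genome)
    reps = set()
    for k in range(n):
        reps.add(tuple(genome[(i + k) % n] for i in range(n)))
        reps.add(tuple(genome[(k - i) % n] for i in range(n)))
    return reps

states = list(itertools.permutations(range(N_REGIONS)))
index = {s: i for i, s in enumerate(states)}
inversions = arc_inversions(N_REGIONS)

# Assumed uniform model: total rate 1, split equally over all inversions.
Q = -np.eye(len(states))
for s in states:
    for p in inversions:
        Q[index[s], index[apply(p, s)]] += 1.0 / len(inversions)

# Every inversion is an involution, so Q is symmetric and eigh yields exp(tQ).
eigvals, eigvecs = np.linalg.eigh(Q)

def likelihood(t, target, reference=tuple(range(N_REGIONS))):
    """Probability that `reference` becomes any rotation/reflection of `target` in time t."""
    row = (eigvecs[index[reference]] * np.exp(t * eigvals)) @ eigvecs.T
    return float(sum(row[index[g]] for g in dihedral_class(target)))

def mle_time(target, grid=np.linspace(0.0, 10.0, 1001)):
    """Grid-search maximum likelihood estimate of the elapsed time t."""
    values = [likelihood(t, target) for t in grid]
    return float(grid[int(np.argmax(values))])
```

The full state space grows as n!, so this direct approach is only feasible for a handful of regions. The abstracts in this list address that cost through representation theory.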


Subjects
Molecular Evolution , Genome/genetics , Phylogeny , Likelihood Functions , Spatial Analysis
12.
J Math Biol ; 75(6-7): 1619-1654, 2017 12.
Article in English | MEDLINE | ID: mdl-28434023

ABSTRACT

Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants.
A wider implication of this is that, for models with more than two states (for example, DNA sequence alignments with four-state models), we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent of the phylogenetic invariants.


Subjects
Phylogeny , Biostatistics/methods , Computer Simulation , DNA/genetics , Molecular Evolution , Markov Chains , Mathematical Concepts , Genetic Models , Sequence Alignment
13.
Bull Math Biol ; 79(3): 619-634, 2017 03.
Article in English | MEDLINE | ID: mdl-28188429

ABSTRACT

We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.


Subjects
Genetic Models , Phylogeny , Biological Evolution , Markov Chains , Mathematical Concepts
14.
J Math Biol ; 73(2): 259-82, 2016 08.
Article in English | MEDLINE | ID: mdl-26660305

ABSTRACT

We consider the continuous-time presentation of the strand symmetric phylogenetic substitution model (in which rate parameters are unchanged under nucleotide permutations given by Watson-Crick base conjugation). Algebraic analysis of the model's underlying structure as a matrix group leads to a change of basis where the rate generator matrix is given by a two-part block decomposition. We apply representation theoretic techniques and, for any (fixed) number of phylogenetic taxa L and polynomial degree D of interest, provide the means to classify and enumerate the associated Markov invariants. In particular, in the quadratic and cubic cases we prove there are precisely [Formula: see text] and [Formula: see text] linearly independent Markov invariants, respectively. Additionally, we give the explicit polynomial forms of the Markov invariants for (i) the quadratic case with any number of taxa L, and (ii) the cubic case in the special case of a three-taxon phylogenetic tree. We close by showing our results are of practical interest since the quadratic Markov invariants provide independent estimates of phylogenetic distances based on (i) substitution rates within Watson-Crick conjugate pairs, and (ii) substitution rates across conjugate base pairs.
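The two-part block decomposition can be checked numerically. A strand symmetric rate matrix $Q$ commutes with the Watson-Crick conjugation permutation $P$ (A↔T, C↔G), so $Q$ preserves the $+1$ and $-1$ eigenspaces of $P$. In an orthonormal basis adapted to those eigenspaces, $Q$ becomes block diagonal. The basis below, built from (A+T), (C+G), (A−T) and (C−G), is one natural choice and is assumed here; the paper's own change of basis may differ. The random rate matrix is purely illustrative.

```python
import numpy as np

BASES = "ACGT"
CONJUGATE = {"A": "T", "T": "A", "C": "G", "G": "C"}

def conjugation_matrix():
    """Permutation matrix of Watson-Crick conjugation A<->T, C<->G."""
    P = np.zeros((4, 4))
    for i, b in enumerate(BASES):
        P[i, BASES.index(CONJUGATE[b])] = 1.0
    return P

def random_strand_symmetric(rng):
    """Random rate matrix with P Q P = Q, i.e. rates unchanged under conjugation."""
    P = conjugation_matrix()
    Q = rng.random((4, 4))
    Q = (Q + P @ Q @ P) / 2         # average over the conjugation symmetry
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a rate matrix sum to zero
    return Q

# Orthonormal eigenvectors of P: columns (A+T), (C+G) for +1 and (A-T), (C-G) for -1.
s = 1 / np.sqrt(2)
B = np.array([
    [s, 0, s, 0],   # A
    [0, s, 0, s],   # C
    [0, s, 0, -s],  # G
    [s, 0, -s, 0],  # T
])

Q = random_strand_symmetric(np.random.default_rng(1))
blocks = B.T @ Q @ B  # the two off-diagonal 2x2 blocks vanish up to rounding
```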


Subjects
Classification/methods , Genetic Models , Phylogeny , Algorithms
15.
Syst Biol ; 64(4): 638-50, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25858352

ABSTRACT

When the process underlying DNA substitutions varies across evolutionary history, some standard Markov models underlying phylogenetic methods are mathematically inconsistent. The most prominent example is the general time-reversible model (GTR) together with some, but not all, of its submodels. To rectify this deficiency, nonhomogeneous Lie Markov models have been identified as the class of models that are consistent in the face of a changing process of DNA substitutions regardless of taxon sampling. Some well-known models in popular use are within this class, but are either overly simplistic (e.g., the Kimura two-parameter model) or overly complex (the general Markov model). On a diverse set of biological data sets, we test a hierarchy of Lie Markov models spanning the full range of parameter richness. Compared against the benchmark of the ever-popular GTR model, we find that as a whole the Lie Markov models perform well, with the best performing models having 8-10 parameters and the ability to recognize the distinction between purines and pyrimidines.


Subjects
Classification/methods , Biological Models , Phylogeny , Animals , DNA/chemistry , DNA/genetics , Mitochondrial DNA/chemistry , Mitochondrial DNA/genetics , Humans , Nucleotides/genetics , Nucleotides/metabolism , Plants/genetics
16.
J Math Biol ; 70(4): 855-91, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24723068

ABSTRACT

Continuous-time Markov chains are a standard tool in phylogenetic inference. If homogeneity is assumed, the chain is formulated by specifying time-independent rates of substitutions between states in the chain. In applications, there are usually extra constraints on the rates, depending on the situation. If a model is formulated in this way, it is possible to generalise it and allow for an inhomogeneous process, with time-dependent rates satisfying the same constraints. It is then useful to require that, under some time restrictions, there exists a homogeneous average of this inhomogeneous process within the same model. This leads to the definition of "Lie Markov models" which, as we will show, are precisely the class of models where such an average exists. These models form Lie algebras and hence concepts from Lie group theory are central to their derivation. In this paper, we concentrate on applications to phylogenetics and nucleotide evolution, and derive the complete hierarchy of Lie Markov models that respect the grouping of nucleotides into purines and pyrimidines-that is, models with purine/pyrimidine symmetry. We also discuss how to handle the subtleties of applying Lie group methods, most naturally defined over the complex field, to the stochastic case of a Markov process, where parameter values are restricted to be real and positive. In particular, we explore the geometric embedding of the cone of stochastic rate matrices within the ambient space of the associated complex Lie algebra.


Subjects
Genetic Models , Purine Nucleotides/genetics , Pyrimidine Nucleotides/genetics , Animals , DNA/genetics , Molecular Evolution , Humans , Markov Chains , Mathematical Concepts , Phylogeny , Stochastic Processes
17.
BMC Evol Biol ; 14: 236, 2014 Dec 04.
Article in English | MEDLINE | ID: mdl-25472897

ABSTRACT

BACKGROUND: Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called "edge length" and "sequence" spectra on a phylogenetic tree. The Hadamard conjugation has been used in diverse phylogenetic applications, not only for inference but also as an important conceptual tool for thinking about molecular data, leading to generalizations beyond strictly tree-like evolutionary modelling. RESULTS: For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-to-one correspondence between pattern probabilities and edge parameters. This takes a classic result previously shown through use of Fourier analysis and presents it in the language of tensors and group representation theory. This derivation makes clear why the inversion is possible: under their usual definition, group-based models are defined for abelian groups only. CONCLUSION: We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.
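For the two-state symmetric model, the classic Hadamard conjugation maps an edge-length spectrum $\gamma$ to a sequence spectrum $s$ via $s = H^{-1}\exp(H\gamma)$, and back via $\gamma = H^{-1}\ln(Hs)$, where $H$ is a Sylvester Hadamard matrix and $H^{-1} = H / 2^{n-1}$. The sketch reproduces only this classic formulation, not the tensor and representation-theoretic derivation described in the abstract. The bitmask indexing of splits (subsets of the first $n-1$ taxa, with the last taxon as reference) and the edge weights are illustrative assumptions.

```python
import numpy as np

def hadamard(num_taxa):
    """Sylvester Hadamard matrix H[i, j] = (-1)^|i & j| of size 2^(num_taxa - 1).

    Index i is a bitmask over the first num_taxa - 1 taxa; the last taxon is the reference.
    """
    size = 2 ** (num_taxa - 1)
    return np.array(
        [[(-1) ** bin(i & j).count("1") for j in range(size)] for i in range(size)],
        dtype=float,
    )

def sequence_spectrum(edge_lengths, num_taxa):
    """s = H^{-1} exp(H gamma), with gamma_0 set to minus the sum of the other entries."""
    H = hadamard(num_taxa)
    gamma = np.array(edge_lengths, dtype=float)  # copy, so the input is not modified
    gamma[0] = -gamma[1:].sum()
    return H @ np.exp(H @ gamma) / len(gamma)

def edge_spectrum(seq_spectrum, num_taxa):
    """Inverse map gamma = H^{-1} ln(H s)."""
    H = hadamard(num_taxa)
    s = np.asarray(seq_spectrum, dtype=float)
    return H @ np.log(H @ s) / len(s)

# Quartet 12|34 with hypothetical edge weights: pendant edges {1}, {2}, {3},
# {4} (encoded as its complement {1,2,3}), plus the internal edge {1,2}.
gamma = np.zeros(8)
gamma[0b001], gamma[0b010], gamma[0b100], gamma[0b111] = 0.10, 0.20, 0.15, 0.05
gamma[0b011] = 0.08
s = sequence_spectrum(gamma, 4)
```

Applying `edge_spectrum` to `s` recovers the non-empty entries of `gamma`, which is the one-to-one correspondence between the two spectra.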


Subjects
Genetic Models , Phylogeny , Biological Evolution , Markov Chains
18.
J Theor Biol ; 327: 88-90, 2013 Jun 21.
Article in English | MEDLINE | ID: mdl-23402954
19.
Syst Biol ; 62(1): 78-92, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22914976

ABSTRACT

In their 2008 and 2009 articles, Sumner and colleagues introduced the "squangles", a small set of Markov invariants for phylogenetic quartets. The squangles are consistent with the general Markov (GM) model and can be used to infer quartets without the need to explicitly estimate all parameters. As the GM model is inhomogeneous and hence nonstationary, the squangles are expected to perform well compared with standard approaches when there are changes in base composition among species. However, the GM model assumes constant rates across sites, so the squangles should be confounded by data generated with invariant sites or other forms of rate-variation across sites. Here we implement the squangles in a least-squares setting that returns quartets weighted by either confidence or internal edge lengths, and we show how these weighted quartets can be used as input into a variety of supertree and supernetwork methods. For the first time, we quantitatively investigate the robustness of the squangles to breaking of the constant rates-across-sites assumption on both simulated and real data sets; and we suggest a modification that improves the performance of the squangles in the presence of invariant sites. Our conclusion is that the squangles provide a novel tool for phylogenetic estimation that is complementary to methods that explicitly account for rate-variation across sites but rely on homogeneous (and hence stationary) models.


Subjects
Classification/methods , Genetic Models , Phylogeny , Animals , Computer Simulation , Least-Squares Analysis , Mammals/classification , Mammals/genetics , Markov Chains , Reproducibility of Results
...